Integrating Syntactic and Prosodic Information for the Efficient Detection of Empty Categories
We describe a number of experiments that demonstrate the usefulness of
prosodic information for a processing module which parses spoken utterances
with a feature-based grammar employing empty categories. We show that by
requiring certain prosodic properties from those positions in the input where
the presence of an empty category has to be hypothesized, a derivation can be
accomplished more efficiently. The approach has been implemented in the machine
translation project VERBMOBIL and results in a significant reduction of the
work-load for the parser.
Comment: To appear in the Proceedings of Coling 1996, Copenhagen. 6 pages
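The filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which prosodic properties are required or how they are scored, so the per-gap boundary score, the threshold, and the function name below are all hypothetical.

```python
def trace_positions(boundary_scores, threshold=0.5):
    """Return the word-gap positions where an empty category may be hypothesized.

    boundary_scores[i] is an assumed prosodic score for the gap after word i;
    both the score and the threshold are illustrative, not taken from the paper.
    """
    return [i for i, score in enumerate(boundary_scores) if score >= threshold]


# Hypothetical scores for the four gaps of a five-word utterance.
scores = [0.1, 0.8, 0.2, 0.9]
allowed = trace_positions(scores)
print(allowed)
```

Without the filter, a parser would have to consider an empty category at every gap; with it, only the gaps that pass the prosodic test are tried, which is the kind of work-load reduction the abstract reports.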
The Treegram Index -- An Efficient Technique for Retrieval in Linguistic Treebanks
We present Treegram indexing, a generalization of the classical n-gram indexing technique, which is applied by the Venona retrieval system.
The Treegram Index - An Efficient Technique for Retrieval in Linguistic Treebanks
In computational linguistics, large tree databases tagged with morpho-syntactic information require fast retrieval of multiway tree structures. To tackle this problem, we present a generalization of the classical n-gram indexing technique called Treegram indexing. As an application of treegram indexing, we describe the Venona retrieval system, which handles the BHt treebank containing 508,650 phrase structure trees.
1 Tree Retrieval
Multiway trees (MT, henceforth) play a central role in representing complex linguistic information because they are a common and well-understood data structure for describing hierarchical information. With the availability of large treebanks, retrieval techniques for highly structured data have become essential. One of the best-known linguistic tree repositories is the Penn treebank of the University of Pennsylvania. It is built on a corpus containing 4.5 million words of American English; half of this corpus has been annotated for sk..
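
The idea of carrying n-gram indexing over to trees can be illustrated with a minimal sketch. The text above does not define a treegram, so the sketch assumes the simplest possible fragment: a parent label together with the exact sequence of its child labels. The tree encoding and all function names are hypothetical and are not taken from Venona.

```python
from collections import defaultdict

# A tree is a (label, children) pair; a leaf has an empty child list.


def treegrams(tree):
    """Yield depth-1 fragments: (parent label, tuple of child labels)."""
    label, children = tree
    if children:
        yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from treegrams(child)


def build_index(trees):
    """Map each fragment to the set of ids of the trees that contain it."""
    index = defaultdict(set)
    for tree_id, tree in enumerate(trees):
        for gram in treegrams(tree):
            index[gram].add(tree_id)
    return index


def candidates(index, query):
    """Intersect the posting sets of all fragments of the query tree."""
    result = None
    for gram in treegrams(query):
        postings = index.get(gram, set())
        result = set(postings) if result is None else result & postings
    return result if result is not None else set()


trees = [
    ("S", [("NP", [("N", [])]), ("VP", [("V", [])])]),
    ("S", [("NP", [("DET", []), ("N", [])]), ("VP", [("V", [])])]),
    ("S", [("VP", [("V", []), ("NP", [("N", [])])])]),
]
index = build_index(trees)
query = ("S", [("NP", []), ("VP", [])])
print(candidates(index, query))
```

As with n-gram indexes over strings, the intersection only narrows the search: the surviving trees are candidates that a full matcher would still have to verify, but the expensive structural comparison runs on far fewer trees than the whole treebank.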